Partitioning the Gov2 Corpus by Internet Domain Name: A Result-set Merging Experiment

نویسندگان

  • Christopher T. Fallen
  • Gregory B. Newby
چکیده

To study the MultiSearch problem and complete the Ad Hoc Task of the 2006 TREC Terabyte Track, the Gov2 collection was divided according to web domain and for each topic, the results from each domain were merged into single ranked list. The mean average precision scores of the results from two different merge algorithms applied to the domain-divided Gov2 collection and a randomized domain-divided collection are compared with a 2-way analysis of variance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed Multisearch and Resource Selection for the TREC Million Query Track

A distributed information retrieval system with resource‐selection and result‐set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. The GOV2 collection was partitioned into host‐name subcollections and distributed to multiple remote machines. The Multisearch demonstration application restricted each search to a fraction of the avail...

متن کامل

Corefrence resolution with deep learning in the Persian Labnguage

Coreference resolution is an advanced issue in natural language processing. Nowadays, due to the extension of social networks, TV channels, news agencies, the Internet, etc. in human life, reading all the contents, analyzing them, and finding a relation between them require time and cost. In the present era, text analysis is performed using various natural language processing techniques, one ...

متن کامل

IIT at TREC 2004 Standard Retrieval Models Over Partitioned Indices for the Terabyte Track

For TREC-2004, we participated in the Terabyte track. We focused on partitioning the data in the GOV2 collection across a homogeneous cluster of machines and indexing and querying the collection in a distributed fashion using different standard retrieval models on a single system, such as the Robertson BM25 probabilistic measure and a vector space measure. Our partitioned indices were each inde...

متن کامل

Map-merging in Multi-robot Simultaneous Localization and Mapping Process Using Two Heterogeneous Ground Robots

In this article, a fast and reliable map-merging algorithm is proposed to produce a global two dimensional map of an indoor environment in a multi-robot simultaneous localization and mapping (SLAM) process. In SLAM process, to find its way in this environment, a robot should be able to determine its position relative to a map formed from its observations. To solve this complex problem, simultan...

متن کامل

Distributed Naming System for Mobile Ad-Hoc Networks

Mobile ad hoc networks (MANETs) are selforganized networks that are envisioned to deploy with zero (or less)-configuration efforts in places where infrastructure networks are not available. In such a network, connectivity is enabled by automatic IP address allocation and dynamic routing. MANETs are highly dynamic with members joining and rejoining, leaving, moving around freely, network partiti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006